An Empirical Method Exploring a Large Set of Features for Authorship Identification

Authors

  • Seifeddine Mechti
  • Maher Jaoua
  • Rim Faiz
  • Lamia Hadrich Belguith
Abstract

In this paper, we address the problem of identifying the author of a document whose origin is unknown. To this end, we propose a new hybrid approach combining statistical and stylistic analysis. The method determines the lexical and syntactic features of the written text in order to identify the author of the document. These features are then used to build a machine learning process. We obtained promising results on the PAN@CLEF2014 English literature corpus. The experimental results are comparable to those obtained by the best state-of-the-art methods.
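As a rough illustration of the kind of pipeline the abstract describes, the sketch below turns a text into a vector of simple lexical features and attributes an unknown document to the stylistically closest candidate. Everything in it is an assumption made for illustration: the abstract does not list the actual features or the learning algorithm, so the function-word list, the punctuation rates (used here as a crude stand-in for syntactic features) and the cosine-similarity attribution are hypothetical choices, not the authors' method.

```python
import math
import re
from collections import Counter

# Hypothetical feature inventory; the paper's actual feature set is not given here.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "it", "is", "was")
PUNCTUATION = (",", ";", ".", "!", "?")


def extract_features(text):
    """Map a text to a fixed-length vector of simple stylometric features."""
    words = re.findall(r"[a-z']+", text.lower())
    n_words = max(len(words), 1)
    counts = Counter(words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    features = [
        sum(len(w) for w in words) / n_words,  # average word length
        len(counts) / n_words,                 # type-token ratio
        n_words / max(len(sentences), 1),      # average sentence length in words
    ]
    # Relative frequency of each function word.
    features += [counts[w] / n_words for w in FUNCTION_WORDS]
    # Punctuation marks per word, a crude proxy for syntactic style.
    features += [text.count(p) / n_words for p in PUNCTUATION]
    return features


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown_text, known_texts):
    """Return the candidate author whose known text is most similar in style."""
    target = extract_features(unknown_text)
    return max(
        known_texts,
        key=lambda author: cosine(target, extract_features(known_texts[author])),
    )
```

In practice the feature vectors would more likely be passed to a trained classifier, and the unscaled features would need normalizing so that word and sentence length do not dominate the similarity. The nearest-profile rule is used here only to keep the sketch self-contained.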


Similar resources

Evaluation of Rough Set Theory for Decision Making of rehabilitation Method for Concrete Pavement

In recent years, a great number of advanced theoretical-empirical methods have been developed for the design and modeling of concrete pavement distress. However, there is no reliable theoretical method for evaluating concrete pavement distresses and deciding how to repair them; only empirical methods are used for this purpose. One of the most usual methods in evaluating concrete paveme...


CEAI: CCM based Email Authorship Identification Model

In this paper we present a model for email authorship identification (EAI) by employing a Cluster-based Classification (CCM) technique. Traditionally, stylometric features have been successfully employed in various authorship analysis tasks; we extend the traditional feature-set to include some more interesting and effective features for email authorship identification (e.g. the last punctuatio...


A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram

Nowadays, the use of various messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based open-source text messaging service. According to the US Securities and Exchange Commission and based on the statistics given for October 2019 to present, 300 million people worldwide used Telegram per month. Telegram users are more concentrated in...


An Analysis Framework for Hybrid Authorship Verification

Given a set of candidate authors for whom some texts of undisputed authorship exist, attributing texts of unknown authorship to one of the candidates is called author verification. This problem has attracted great attention due to its new applications in forensic analysis, e-commerce and plagiarism detection. The author verification task is of great help in the plagiarism detection process. Indeed, th...


Modified signed log-likelihood test for the coefficient of variation of an inverse Gaussian population

In this paper, we consider the problem of two-sided hypothesis testing for the coefficient of variation of an inverse Gaussian population. The approach used here is the modified signed log-likelihood ratio (MSLR) method, which is a modification of the traditional signed log-likelihood ratio test. Previous works show that this proposed method has third-order accuracy whereas the traditi...




Publication date: 2016